Conversation

@NathanHB NathanHB commented Oct 31, 2025

To run:

lighteval endpoint inference-providers "model_name=openai/gpt-oss-20b,provider=hyperbolic,generation_parameters={max_new_tokens:8192}" "lighteval|mmlu_pro|0" --save-details

@NathanHB NathanHB requested a review from Copilot October 31, 2025 13:13

Copilot AI left a comment


Pull Request Overview

This PR adds support for the MMLU Pro benchmark, a multiple-choice question answering task from the TIGER-Lab/MMLU-Pro dataset.

  • Introduces a new MMLU Pro task configuration
  • Implements a custom prompt function for MMLU Pro questions
  • Configures evaluation on the test split with validation for few-shots
Comments suppressed due to low confidence (8)

src/lighteval/tasks/tasks/mmlu_pro.py:74

  • The task configuration is missing the generation_size parameter, which is required for generative metrics like gpqa_instruct_metric. Based on similar tasks using this metric (e.g., gpqa.py lines 57, 73, 89), a value such as generation_size=30 or generation_size=32768 should be specified, depending on whether reasoning traces are expected.

src/lighteval/tasks/tasks/mmlu_pro.py:74

  • The task configuration is missing the stop_sequence parameter. Given the generative nature of the task and similar configurations (e.g., gpqa.py lines 59, 75, 91), stop_sequence=[] should be explicitly set so that generation stops only at the EOS token.

src/lighteval/tasks/tasks/mmlu_pro.py:23

  • Import of 'LogLikelihoodAccMetric' is not used. (Diff context: the module docstring cites https://arxiv.org/abs/2406.01574, followed by "from string import ascii_uppercase".)
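The two missing-parameter comments above could be addressed along these lines. This is a hypothetical fragment, not the PR's final code: the field names (generation_size, stop_sequence) come from the review's gpqa.py references, and the specific value chosen is an assumption.

```python
# Hypothetical fragment; assumes lighteval's LightevalTaskConfig API.
from lighteval.tasks.lighteval_task import LightevalTaskConfig  # assumed import path

mmlu_pro = LightevalTaskConfig(
    name="mmlu_pro",
    # ... other fields (prompt function, dataset, metrics) unchanged ...
    generation_size=32768,  # assumption: leave room for reasoning traces, per the gpqa.py precedent
    stop_sequence=[],       # empty list: stop only at the model's EOS token
)
```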

src/lighteval/tasks/tasks/mmlu_pro.py:25

  • Imports of 'LogProbCharNorm', 'LogProbPMINorm', and 'LogProbTokenNorm' are not used. (Diff context: "from lighteval.metrics.metrics import Metrics".)

src/lighteval/tasks/tasks/mmlu_pro.py:27

  • Import of 'get_metrics_for_formulation' is not used. (Diff context: "from lighteval.tasks.requests import Doc".)

src/lighteval/tasks/tasks/mmlu_pro.py:29

  • Import of 'get_mcq_prompt_function' is not used.

src/lighteval/tasks/tasks/mmlu_pro.py:34

  • Imports of 'CFFormulation', 'HybridFormulation', and 'MCFFormulation' are not used.

src/lighteval/tasks/tasks/mmlu_pro.py:35

  • Import of 'Language' is not used.

Diff context, the prompt template:

TEMPLATE = """
Answer the following multiple choice question. The last line of your response should be of the following format: 'Answer: $LETTER' (without quotes) where LETTER is one of ABCD. Think step by step before answering.

{question}

{choices}
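An illustrative sketch (not the PR's actual code) of how a template like this is typically used: build the prompt by labeling each option with a letter, then pull the final answer letter back out of a model response, as an instruct-style metric such as gpqa_instruct_metric does. Function names here are made up for illustration.

```python
import re
from string import ascii_uppercase

# Template text taken from the diff context above; trailing backslashes
# keep the instruction on one logical line.
TEMPLATE = """\
Answer the following multiple choice question. The last line of your response \
should be of the following format: 'Answer: $LETTER' (without quotes) where \
LETTER is one of ABCD. Think step by step before answering.

{question}

{choices}"""


def build_prompt(question, options):
    """Label each option A), B), C), ... and fill the template."""
    choices = "\n".join(
        f"{letter}) {text}" for letter, text in zip(ascii_uppercase, options)
    )
    return TEMPLATE.format(question=question, choices=choices)


def extract_answer(response):
    """Return the letter from the last 'Answer: X' occurrence, or None."""
    matches = re.findall(r"Answer:\s*([A-Z])", response)
    return matches[-1] if matches else None
```

Taking the last match rather than the first tolerates chain-of-thought text that happens to mention "Answer:" before the final line.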


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@NathanHB NathanHB merged commit fa4860f into main Nov 4, 2025
5 checks passed


3 participants